Transitivity and the Co-occurrence Relation in LSI
Authors
Abstract
Current research in Latent Semantic Indexing (LSI) shows improvements in performance for a wide variety of information retrieval systems. Researchers use experimental methods to determine the appropriate number of dimensions for a given application. We propose the development of a theoretical foundation for choosing this parameter in LSI. We assert that LSI's use of higher orders of co-occurrence is critical to this optimization. In this work we present experiments that precisely determine the degree of transitivity used in LSI, and we empirically demonstrate that LSI uses up to fourth-order term co-occurrence. We also prove mathematically that a transitivity path exists for every nonzero element in the truncated term-term matrix computed by LSI. A complete understanding of the degree of transitivity will be key to understanding how a reflexive, symmetric and transitive relation based on the co-occurrence relation can form semantic equivalence classes for a collection.
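To make the two central notions concrete, here is a minimal NumPy sketch on a hypothetical seven-term, five-document toy collection (not data from the paper). It assumes the standard definitions: two terms co-occur at first order when some document contains both, and at order n when the shortest chain of first-order links between them has n steps; the truncated term-term matrix is taken to be A_k A_k^T = U_k Σ_k² U_k^T from a rank-k SVD. The helper names `cooccurrence_order` and `truncated_term_term`, the choice k = 3, and the tolerance are illustrative assumptions. The loop only checks the stated property (every nonzero entry has a transitivity path) on this one example; it is not the paper's proof or experimental setup.

```python
from __future__ import annotations

from collections import deque

import numpy as np


def cooccurrence_order(A: np.ndarray) -> list[list[int | None]]:
    """Hypothetical helper: order of co-occurrence for every pair of terms.

    Order 1 = the terms share a document; order n = shortest chain of
    first-order links has n steps; None = no transitivity path at all.
    """
    n_terms = A.shape[0]
    P = (A > 0).astype(int)
    direct = (P @ P.T) > 0  # first-order co-occurrence graph
    orders: list[list[int | None]] = []
    for start in range(n_terms):
        dist: list[int | None] = [None] * n_terms
        dist[start] = 0
        queue = deque([start])
        while queue:  # breadth-first search yields shortest chain lengths
            i = queue.popleft()
            for j in range(n_terms):
                if j != i and direct[i, j] and dist[j] is None:
                    dist[j] = dist[i] + 1
                    queue.append(j)
        orders.append(dist)
    return orders


def truncated_term_term(A: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: A_k A_k^T = U_k S_k^2 U_k^T from a rank-k SVD."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)  # s is sorted descending
    Uk, sk = U[:, :k], s[:k]
    return (Uk * sk**2) @ Uk.T  # scale each column of U_k by its s^2


# Hypothetical toy collection: rows are terms t0..t6, columns documents d0..d4.
A = np.array([
    [1, 0, 0, 0, 0],  # t0: d0
    [1, 1, 0, 0, 0],  # t1: d0, d1
    [0, 1, 1, 0, 0],  # t2: d1, d2
    [0, 0, 1, 1, 0],  # t3: d2, d3
    [0, 0, 0, 1, 0],  # t4: d3
    [0, 0, 0, 0, 1],  # t5: d4
    [0, 0, 0, 0, 1],  # t6: d4
], dtype=float)

orders = cooccurrence_order(A)
B = truncated_term_term(A, k=3)
tol = 1e-10  # treat tiny floating-point residue as zero
n = len(A)

# Every nonzero entry of the truncated matrix must join a connected pair.
for i in range(n):
    for j in range(n):
        if abs(B[i, j]) > tol:
            assert orders[i][j] is not None

# Deepest co-occurrence order that reaches a nonzero entry in this toy case.
max_order = max(orders[i][j] for i in range(n) for j in range(n)
                if abs(B[i, j]) > tol)
print(orders[0][4], round(B[0, 4], 4), max_order)  # prints: 4 -0.2236 4
```

Here t0 and t4 never share a document, yet they receive a nonzero value in the rank-3 matrix, reachable only through a fourth-order chain; t0 and t5 lie in separate components and get a numerically zero entry. The check runs in one direction only: t1 and t3 are linked at second order, but their entry cancels to zero in this example, so a path is necessary but not sufficient for a nonzero value.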
Similar resources
A Mathematical View of Latent Semantic Indexing: Tracing Term Co-occurrences
Current research in Latent Semantic Indexing (LSI) shows improvements in performance for a wide variety of information retrieval systems. We propose the development of a theoretical foundation for understanding the values produced in the reduced form of the term-term matrix. We assert that LSI’s use of higher orders of co-occurrence is a critical component of this study. In this work we present...
Detecting Patterns in the LSI Term-Term Matrix
Higher order co-occurrences play a key role in the effectiveness of systems used for text mining. A wide variety of applications use techniques that explicitly or implicitly employ a limited degree of transitivity in the co-occurrence relation. In this work we show use of higher orders of co-occurrence in the Singular Value Decomposition (SVD) algorithm and, by inference, on the systems that re...
A framework for understanding Latent Semantic Indexing (LSI) performance
In this paper we present a theoretical model for understanding the performance of Latent Semantic Indexing (LSI) search and retrieval applications. Many models for understanding LSI have been proposed. Ours is the first to study the values produced by LSI in the term dimension vectors. The framework presented here is based on term co-occurrence data. We show a strong correlation between second ...
A Framework for Understanding LSI Performance
In this paper we present a theoretical model for understanding the performance of LSI search and retrieval applications. Many models for understanding LSI have been proposed. Ours is the first to study the values produced by LSI in the term dimension vectors. The framework presented here is based on term co-occurrence data. We show a strong correlation between second order term co-occurrence an...
Evaluation of Co-occurring Terms in Clinical Documents Using Latent Semantic Indexing
OBJECTIVES Measurement of similarities between documents is typically influenced by the sparseness of the term-document matrix employed. Latent semantic indexing (LSI) may improve the results of this type of analysis. METHODS In this study, LSI was utilized in an attempt to reduce the term vector space of clinical documents and newspaper editorials. RESULTS After applying LSI, document simi...
Publication date: 2002